A fundamental goal of educational research is identifying students’ current stage of 
skill mastery (complete/partial/none). In recent years a number of cognitive diagnosis 
models have become a popular means of estimating student skill knowledge. However, 
these models become difficult to estimate as the number of students, items, and skills 
grows. There exist alternatives such as sum-scores and the capability matrix. While 
initial theoretical work on sum-scores has been done, the behavior of sum-scores and 
the capability matrix is not well understood with respect to each other or to estimates 
from cognitive diagnosis models. In this paper we compare the performance of the 
three estimates of student skill knowledge under a variety of clustering methods using 
simulated data with varying levels of missing values. 

1 Introduction 

A fundamental goal of educational research is identifying students’ current stage of 
skill mastery (complete/partial/none). In addition, finding groups of students with similar 
skill set profiles is important to provide feedback for classroom instruction. In recent years 
a number of cognitive diagnosis models [3,8] have become a popular means of estimating 
student skill knowledge. However, these models become difficult and time-consuming 
to estimate as the number of students, items, and skills increases [8]. Two alternative 
estimates, sum-scores [3,6] and the capability matrix [1], can be used to estimate student 
skill knowledge in (near to) real time. Estimates are subsequently clustered to identify 
similar skill set profiles. 

While initial theoretical work on sum-scores has been done [3], the behavior and performance 
of sum-scores and the capability matrix are not well understood in comparison 
with each other or with estimates from cognitive diagnosis models. The performance of 
the methods when missing values occur is also of interest. Moreover, which clustering 
method to employ is an open question. In this work we take a step back and compare 
the performance of three estimates of student skill knowledge under a variety of clustering 
methods. In Section 2, we describe the three different estimates of student skill knowledge. 
In Section 3, we give a brief introduction to the clustering methods used. In Section 4, 
we show results from a simulation study incorporating varying amounts of missing data. 
Finally, in Section 5, we offer conclusions and thoughts on future work. 

2 Estimates of Student Skill Knowledge 

While there may be several possible methods to estimate student skill knowledge, this 
paper considers one traditional Bayesian estimation procedure and two simpler statistics. 
First, we introduce notation that is common among the methods. We begin by 
assembling the skill dependencies of each item into a Q-matrix [2,12]. The Q-matrix, also 
referred to as a transfer model or skill coding, is a J × K matrix where $q_{jk} = 1$ if item j 
requires skill k and 0 if it does not, J is the total number of items, and K is the total number 
of skills. The Q-matrix is usually an expert-elicited assignment matrix. This paper assumes 
the Q-matrix is known and correct. 

There are (at least) two ways in which Q-matrices can differ. First, each item could 
require only a single skill or multiple skills. A Q-matrix can then be composed of all 
single skill items, single and multiple skill items, or all multiple skill items. Second, the 
Q-matrix may have a balanced or unbalanced design. In a balanced design, all single skill 
items occur the same number of times, and each combination of skills occurs the same 
number of times. For example, if K = 3 and J = 30 one possible balanced design would 
be: five single skill items for each skill, four double skill items for each pair of skills, and 
three triple skill items. A design could be unbalanced in two ways: either all skills or 
combinations of skills are present but do not occur the same number of times, or there are 
missing skills or combinations of skills. 
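
The balanced K = 3, J = 30 example above can be written out explicitly. The following is a minimal sketch only; the helper name `balanced_q_matrix` and its `reps_by_size` argument are hypothetical and are not part of the original study.

```python
from itertools import combinations


def balanced_q_matrix(n_skills, reps_by_size):
    """Build a balanced Q-matrix as a list of 0/1 rows (one row per item).

    reps_by_size[m] is the number of items created for every combination
    of exactly m skills (a hypothetical parameter used for illustration).
    """
    rows = []
    for size in sorted(reps_by_size):
        for combo in combinations(range(n_skills), size):
            row = [1 if k in combo else 0 for k in range(n_skills)]
            rows.extend(list(row) for _ in range(reps_by_size[size]))
    return rows


# K = 3, J = 30: five items per single skill, four per pair, three triples.
Q = balanced_q_matrix(3, {1: 5, 2: 4, 3: 3})
items_per_skill = [sum(row[k] for row in Q) for k in range(3)]
```

In this design every skill is required by the same number of items, which is the property that makes the design balanced.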
We then assemble student responses in an N × J response matrix Y where $y_{ij}$ indicates 
both whether student i attempted item j and whether or not they answered item j correctly, and 
N is the total number of students. If student i did not answer item j, then $y_{ij} = \mathrm{NA}$. The 
indicator $1_{y_{ij} \neq \mathrm{NA}} = 0$ expresses this missing value. If student i attempted item j ($1_{y_{ij} \neq \mathrm{NA}} = 1$), 
then $y_{ij} = 1$ if they answered correctly, or 0 if they answered incorrectly. 

2.1 DINA Model Estimates 

The first method of estimating student skill knowledge uses a common conjunctive 
cognitive diagnosis model. The deterministic inputs, noisy “and” gate model (DINA; [8]) 
models student responses as 

$$P(Y_{ij} = 1 \mid \eta_{ij}, s_j, g_j) = (1 - s_j)^{\eta_{ij}} \, g_j^{1 - \eta_{ij}} \tag{1}$$

where $\alpha_{ik} = 1[\text{student } i \text{ has skill } k]$ indicates if student i possesses skill k, $\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}}$ 
indicates if student i has all skills needed for item j, $s_j = P(Y_{ij} = 0 \mid \eta_{ij} = 1)$ is the slip 
parameter, and $g_j = P(Y_{ij} = 1 \mid \eta_{ij} = 0)$ is the guess parameter. If a student is missing any 
of the required skills, the probability that they will answer an item correctly drops due to 
the conjunctive assumption. 
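
To make the conjunctive rule concrete, the sketch below evaluates the DINA response probability for one student and one item and draws a response matrix from it. It is an illustration under stated assumptions, not the estimation procedure used in the paper; the function names and the list-of-lists data layout are hypothetical.

```python
import random


def dina_prob_correct(alpha_i, q_j, slip_j, guess_j):
    """P(Y_ij = 1) under the DINA model for one student and one item."""
    # eta_ij is 1 only if the student has every skill the item requires.
    eta = all(a == 1 for a, q in zip(alpha_i, q_j) if q == 1)
    return (1.0 - slip_j) if eta else guess_j


def simulate_responses(alpha, Q, slip, guess, rng):
    """Draw an N x J matrix of 0/1 responses from the DINA model."""
    return [[1 if rng.random() < dina_prob_correct(a_i, q_j, s, g) else 0
             for q_j, s, g in zip(Q, slip, guess)]
            for a_i in alpha]


# A student with skills 1 and 3 facing an item that needs skills 1 and 2:
# one required skill is missing, so only the guess probability remains.
p = dina_prob_correct([1, 0, 1], [1, 1, 0], slip_j=0.2, guess_j=0.1)
```

Passing a seeded `random.Random` instance as `rng` keeps a simulated population reproducible.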

We estimate the student skill knowledge parameters of the DINA model, the $\alpha_{ik}$, using 
Markov Chain Monte Carlo methods with the program WinBUGS (Bayesian Inference 
Using Gibbs Sampling, [9]). In the model, the $\alpha_{ik}$ are 0/1 indicators of whether or not student 
i has mastered skill k. Our estimates will be $\hat{\alpha}_{ik} \in [0, 1]$. We can think of the $\hat{\alpha}_{ik}$ as the 
probability that student i has mastered skill k. 

2.2 Sum-scores 

The second estimate we consider is the sum-score method of [3,6]. Here $W_i = (W_{i1}, W_{i2}, \ldots, W_{iK})$ 
is a vector of sum-scores where the kth component is defined as 

$$W_{ik} = \sum_{j=1}^{J} y_{ij} \, q_{jk} \tag{2}$$

where $y_{ij}$ and $q_{jk}$ are the corresponding entries from the response matrix Y and Q-matrix. 
Thus, the components of $W_i$ are simply the number of items student i answered correctly 
for each skill k. When an item requires more than one skill it will contribute to more than 
one component of $W_i$. The range of $W_{ik}$ may be different for each k if the skills are required 
by a different number of problems. 
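
Equation 2 amounts to a single matrix product. A minimal sketch, assuming complete 0/1 responses stored as lists of lists (the function name `sum_scores` is hypothetical):

```python
def sum_scores(Y, Q):
    """W[i][k] = sum over items j of y_ij * q_jk, for complete 0/1 responses."""
    n_skills = len(Q[0])
    return [[sum(y * q_j[k] for y, q_j in zip(y_i, Q)) for k in range(n_skills)]
            for y_i in Y]


# Three items: item 1 needs skill 1, item 2 needs skill 2, item 3 needs both.
Q = [[1, 0], [0, 1], [1, 1]]
Y = [[1, 0, 1],   # student 1: items 1 and 3 correct
     [0, 1, 0]]   # student 2: item 2 correct
W = sum_scores(Y, Q)
```

The third item counts toward both components for student 1, which is the double contribution of multiple skill items described above.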

2.3 Capability Matrix 

Finally, we consider the capability matrix defined in [1]. The capability matrix B is an 
N × K matrix where $B_{ik}$ is the proportion of correctly answered items involving skill k that 
student i attempted. Thus, 

$$B_{ik} = \frac{\sum_{j=1}^{J} 1_{y_{ij} \neq \mathrm{NA}} \cdot y_{ij} \cdot q_{jk}}{\sum_{j=1}^{J} 1_{y_{ij} \neq \mathrm{NA}} \cdot q_{jk}} \tag{3}$$

where $y_{ij}$ and $q_{jk}$ are the corresponding entries from the response matrix Y and Q-matrix. 
The capability matrix expands on sum-scores by accounting for the number of items requiring 
skill k that student i answered. In this manner the statistic scales for the number of 
items in which the skill appears as well as for missing data. If a student has not seen all 
of the items requiring a particular skill, we still derive an estimate based on the available 
information. If student i completes no items involving skill k, then $B_{ik} = \mathrm{NA}$. In this case, 
we impute an uninformative value (e.g., 0.5, mean, median) to map students to the hypercube. 
Exploring the performance of these imputation choices is ongoing. For this paper we 
assume that the data are complete or that missing $B_{ik}$ values are appropriately imputed. 
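
The capability matrix can be computed directly from its definition. The sketch below is one possible reading, assuming missing responses are encoded as `None` and that 0.5 is the imputed value when a student attempted no item for a skill (one of the options mentioned above); the function name and the `fill` parameter are hypothetical.

```python
def capability_matrix(Y, Q, fill=0.5):
    """B[i][k]: share of attempted items requiring skill k answered correctly.

    Missing responses are encoded as None (standing in for NA). `fill` is
    imputed when student i attempted no item that requires skill k.
    """
    n_skills = len(Q[0])
    B = []
    for y_i in Y:
        row = []
        for k in range(n_skills):
            attempted = [y for y, q_j in zip(y_i, Q)
                         if y is not None and q_j[k] == 1]
            row.append(sum(attempted) / len(attempted) if attempted else fill)
        B.append(row)
    return B


Q = [[1, 0], [0, 1], [1, 1]]
Y = [[1, None, 0],      # student 1 skipped item 2
     [None, 1, None]]   # student 2 answered only item 2
B = capability_matrix(Y, Q)
```

Student 2 attempted no item requiring skill 1, so that entry falls back to the imputed value rather than being left undefined.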


We can note that both the DINA model estimates and the B-matrix values map students 
into a K-dimensional hypercube (for each dimension, zero indicates total lack of skill mastery, 
one is complete skill mastery, and values in between are less certain). The $2^K$ corners 
of the hypercube correspond to natural skill set profiles $C_i = \{C_{i1}, C_{i2}, \ldots, C_{iK}\}$, $C_{ik} \in \{0, 1\}$. 


Additionally, we can note theoretical connections between the sum-scores and B-matrix 
values. If there are no missing response values $y_{ij}$, then 

$$W_{ik} = J_k \cdot B_{ik} \tag{4}$$

where $J_k$ is the number of items that require skill k. When all students have answered 
all questions and there is a balanced Q-matrix design (i.e., $J_1 = J_2 = \cdots = J_K$), the two 
estimates will map to the same (scaled) feature space. In this case, we expect the two 
estimates to perform similarly. However, when there is either missing data or an unbalanced 
Q-matrix design, the space to which the estimates map will be different. In this case, we 
cannot guarantee that performance will be similar. 
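
Equation 4 follows from the definitions in Equations 2 and 3. A short derivation, assuming complete data so that every missingness indicator equals one:

```latex
\begin{align*}
B_{ik}
  &= \frac{\sum_{j=1}^{J} 1_{y_{ij} \neq \mathrm{NA}} \, y_{ij} \, q_{jk}}
          {\sum_{j=1}^{J} 1_{y_{ij} \neq \mathrm{NA}} \, q_{jk}}
   = \frac{\sum_{j=1}^{J} y_{ij} \, q_{jk}}{\sum_{j=1}^{J} q_{jk}}
   = \frac{W_{ik}}{J_k},
  \qquad J_k = \sum_{j=1}^{J} q_{jk}.
\end{align*}
```

For the balanced K = 3, J = 30 design described in Section 2, each skill is required by 5 + 2(4) + 3 = 16 items, so every column of W is the corresponding column of B multiplied by 16.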

3 Clustering Methods 

To identify groups of students with similar skill set profiles, we cluster the student 
skill knowledge estimates. In this paper we compare the performance of three common 
clustering methods: hierarchical agglomerative clustering, K-means, and model-based 
clustering. In the sections below we briefly introduce each of these methods. 

3.1 Hierarchical Agglomerative Clustering 

Hierarchical agglomerative clustering (HAC; [10]) links groups in order of closeness to 
form a tree structure from which a clustering solution can be extracted. Euclidean distance 
is most commonly used to measure the distance between observations. The method also requires 
the user to specify how to measure the distance between groups. We will use “complete” 
linkage, where the distance between any two groups is defined as the largest distance between 
two observations, one from each group. In HAC, all observations begin as their own 
group. The two closest groups are merged and all inter-group distances are recalculated. 
We continue merging groups and recalculating distances until a single group with all 
observations is formed. Once the tree structure is formed, we can extract the desired number 
of clusters G by cutting the tree at a height corresponding to G branches. 
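
The merging procedure can be sketched in a few lines. This is an illustrative, unoptimized version that stops merging once G groups remain, which gives the same partition as building the full tree and cutting it at G branches; the function name `complete_linkage` is hypothetical and this is not the software used in the study.

```python
import math


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage; returns cluster labels."""
    clusters = [[i] for i in range(len(points))]  # every observation starts alone

    def gap(a, b):
        # Complete linkage: largest Euclidean distance between members of a and b.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the two closest groups and merge them (b > a, so a stays valid).
        _, a, b = min((gap(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] += clusters.pop(b)

    labels = [0] * len(points)
    for g, members in enumerate(clusters):
        for i in members:
            labels[i] = g
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.0, 1.0)]
labels = complete_linkage(points, n_clusters=3)
```

Recomputing every inter-group distance at each step keeps the sketch short but is slow for large N; library implementations update distances incrementally.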

3.2 K-means 

K-means [5] is a popular iterative descent algorithm for data $X = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ is a real-valued vector. 

It uses squared Euclidean distance as a dissimilarity measure and tries to minimize within-cluster 
distance and maximize between-cluster distance. For a given number of clusters G, 
K-means searches for cluster centers $m_g$ and assignments A that minimize the criterion 

$$\sum_{g=1}^{G} \sum_{A(i) = g} \lVert x_i - m_g \rVert^2 .$$

The algorithm alternates between optimizing the cluster centers for the current assignment 
(by the current cluster means) and optimizing the cluster assignment for a given set 
of cluster centers (by assigning to the closest current center) until convergence (i.e., cluster 
assignments do not change). It tends to find compact, spherical clusters and requires a 
priori both the number of clusters G and a starting set of cluster centers. The final cluster 
assignment can be sensitive to the choice of centers; a common method for initializing 
K-means is to randomly choose G observations. 
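
The alternating scheme described above can be sketched as follows. This is a plain illustration of the standard algorithm with random initial centers, not the implementation used for the reported results; the function name, the `rng` argument, and the `max_iter` cap are assumptions.

```python
import random


def k_means(points, n_clusters, rng, max_iter=100):
    """K-means with G randomly chosen observations as the starting centers."""
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the closest current center.
        new_assign = [
            min(range(n_clusters),
                key=lambda g: sum((x - c) ** 2 for x, c in zip(p, centers[g])))
            for p in points
        ]
        if new_assign == assign:
            break  # converged: cluster assignments did not change
        assign = new_assign
        # Update step: each center becomes the mean of its assigned points.
        for g in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == g]
            if members:  # an empty cluster keeps its previous center
                centers[g] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers


points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (5.1, 5.0)]
assign, centers = k_means(points, n_clusters=2, rng=random.Random(0))
```

Because the result depends on the starting centers, running the function with several seeds and keeping the solution with the smallest criterion value is a common safeguard.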

3.3 Model-based Clustering 

Model-based clustering [4, 11] is a parametric statistical approach that assumes: the 
data $X = \{x_1, x_2, \ldots, x_n\}$ are an independently and identically distributed sample 
from an unknown population density $p(x)$; each population group g is represented by a 
(often Gaussian) density $p_g(x)$; and $p(x)$ is a weighted mixture of these density components, 
i.e. $p(x) = \sum_{g=1}^{G} \pi_g \, p_g(x; \theta_g)$ where $\sum_{g=1}^{G} \pi_g = 1$, $0 < \pi_g \le 1$ for $g = 1, 2, \ldots, G$, and 
$\theta_g = (\mu_g, \Sigma_g)$ for Gaussian components. The method chooses the number of components 
G by maximizing the Bayesian Information Criterion (BIC) and estimates the means and 
variances $(\mu_g, \Sigma_g)$ via maximum likelihood. While it may assume Gaussian components, its 
flexibility on their shape, volume, and orientation allows student groups of varying shapes 
and sizes. When multiple students map to the same location, model-based clustering is 
known to overfit the data by using spikes with near singular covariance in these locations 
[4]. To alleviate this concern, we jitter the student skill estimates by a small amount (0.01). 
The effect on our results is minimal. 

Table 1: Clustering the DINA Model Estimates of Student Skill Knowledge 

N    J   K  Q-matrix design    DINA             HAC              K-means          MBC              MBC 2^K
250  30  3  Single, bal        1.000 (0.0054)   1.000 (0.0054)   0.8739 (0.0736)  0.9966 (0.0895)  1.000 (0.0349)
250  30  3  Both, bal          0.9793 (0.0179)  0.9781 (0.0200)  0.8367 (0.1192)  0.8915 (0.0882)  0.9632 (0.1087)
250  30  3  Both, unbal, all   0.9657 (0.0285)  0.9657 (0.2920)  0.7789 (0.0941)  0.9129 (0.0505)  0.9350 (0.0758)
250  30  3  Both, unbal, miss  0.9240 (0.0395)  0.9131 (0.0427)  0.7696 (0.0858)  0.8811 (0.0696)  0.9132 (0.0428)
250  30  3  Mult, bal          0.4677 (0.0292)  0.5127 (0.0443)  0.5012 (0.0578)  0.5282 (0.0690)  0.4979 (0.0411)
250  30  3  Mult, unbal, all   0.4629 (0.0430)  0.4874 (0.0536)  0.4948 (0.0816)  0.5130 (0.0736)  0.4790 (0.0495)
250  30  3  Mult, unbal, miss  0.3239 (0.0380)  0.4070 (0.0596)  0.3835 (0.0521)  0.4266 (0.0837)  0.4090 (0.0630)
500  68  5  Both, bal          0.9463 (0.0184)  0.9428 (0.0188)  0.7132 (0.0428)  0.8348 (0.1123)  0.9243 (0.0488)
500  68  5  Both, unbal, miss  0.8724 (0.0247)  0.8729 (0.0219)  0.6665 (0.0466)  0.8213 (0.0960)  0.8624 (0.0226)
300  40  7  Single             0.9041 (0.0262)  0.8891 (0.0286)  0.7674 (0.0409)  0.3050 (0.1203)  0.8881 (0.0282)

4 Simulation Study 

To compare the skill knowledge estimates and clustering methods described above, we 
conducted a simulation study using data generated from the DINA model (Equation 1). The 
Q-matrix design is varied to include balanced and unbalanced combinations of single and 
multiple skill items. Then, for a fixed Q-matrix design, we simulate 20 different student 
populations. Skill difficulties are always set to equal medium difficulty; inter-skill correlations 
are set to zero. These choices evenly spread students among the $2^K$ natural skill set 
profiles $\{0, 1\}^K$. For each student population, we generate true skill set profiles $C_i$. We then 
draw slip and guess parameters from a random uniform distribution ($s_j \sim \mathrm{Unif}(0, 0.30)$; 
$g_j \sim \mathrm{Unif}(0, 0.15)$). Given profiles and slip/guess parameters, we generate the student response 
matrix Y. 

Table 2: Clustering the Sum-scores Estimates of Student Skill Knowledge 

N    J   K  Q-matrix design    HAC              K-means          MBC              MBC 2^K
250  30  3  Single, bal        0.9910 (0.0110)  0.8549 (0.0960)  0.9191 (0.2899)  0.9957 (0.0071)
250  30  3  Both, bal          0.7644 (0.1095)  0.8156 (0.1110)  0.9321 (0.1181)  0.9442 (0.0515)
250  30  3  Both, unbal, all   0.6398 (0.0889)  0.7707 (0.0951)  0.6970 (0.2138)  0.8494 (0.0713)
250  30  3  Both, unbal, miss  0.6482 (0.0511)  0.6728 (0.0650)  0.7066 (0.2064)  0.7661 (0.1095)
250  30  3  Mult, bal          0.3950 (0.0339)  0.4720 (0.0648)  0.4383 (0.0675)  0.4375 (0.0517)
250  30  3  Mult, unbal, all   0.3862 (0.0533)  0.4606 (0.0670)  0.4380 (0.0696)  0.4481 (0.0428)
250  30  3  Mult, unbal, miss  0.2689 (0.0273)  0.2827 (0.0848)  0.3314 (0.0352)  0.3099 (0.0347)
500  68  5  Both, bal          0.4006 (0.0560)  0.5859 (0.0442)  0.5893 (0.1223)  0.6523 (0.0432)
500  68  5  Both, unbal, miss  0.4104 (0.0373)  0.54412 (0.0366) 0.6010 (0.0537)  0.6265 (0.0397)
300  40  7  Single             0.7348 (0.0526)  0.6474 (0.0456)  0.0973 (0.0362)  0.7080 (0.0453)

As we know the true underlying skill set profiles $C_i$, we can calculate their agreement 
with the clustering partitions using the Adjusted Rand Index (ARI; [7]), a common measure 
of agreement between two partitions. The expected value of the ARI under chance agreement is zero and the 
maximum value is one, with larger values indicating better agreement. 
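
For reference, the ARI can be computed from the contingency counts of the two partitions. The following sketch uses the standard pair-counting formula; the function name is hypothetical, and any hashable labels (for example, skill set profiles stored as tuples) can be used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same observations."""
    n = len(labels_a)
    # Pairs of observations placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])    # identical up to relabeling
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # no pair kept together
```

The index ignores the label names themselves, so a clustering does not need to number its groups in the same way as the true profiles. Values below zero indicate agreement worse than expected by chance.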

Tables 1, 2, and 3 show the clustering results for the DINA model estimates, sum-scores, 
and the capability matrix, respectively. In each table, N is the number of students, 
J is the number of items, and K is the number of skills. The Q-matrix design describes 
the Q-matrix used when generating the student responses (see Section 2 for more details). 
Here single indicates that there were only single skill items, both indicates that there were 
both single and multiple skill items, and mult indicates that there were only multiple skill 
items. Also, bal indicates that the Q-matrix had a balanced design. An unbalanced design 
is denoted by unbal, and all or miss shows whether all combinations were present or if some 
were missing. For the DINA column of Table 1, we rounded the $\hat{\alpha}_{ik}$ to 0/1 to find the 
closest skill set profile. For the remaining methods in Table 1 and for all methods in Tables 2 
and 3 we cluster the unrounded estimates. When using HAC and K-means, we set the number of 
clusters equal to $2^K$ as suggested by [3]. For MBC we search over an appropriate range; 
MBC $2^K$ indicates that we set the number of clusters to $2^K$. For each set of 20 simulations, 
we report the median ARI and the standard deviation. 

Table 3: Clustering the Capability Matrix Estimates of Student Skill Knowledge 

N    J   K  Q-matrix design    HAC              K-means          MBC              MBC 2^K
250  30  3  Single, bal        0.9910 (0.0104)  0.8190 (0.0835)  0.9957 (0.0071)  0.9957 (0.0071)
250  30  3  Both, bal          0.7644 (0.1095)  0.7947 (0.1056)  0.9353 (0.1583)  0.9411 (0.0300)
250  30  3  Both, unbal, all   0.7273 (0.0867)  0.8082 (0.1227)  0.6252 (0.1719)  0.8281 (0.1543)
250  30  3  Both, unbal, miss  0.6698 (0.0813)  0.7390 (0.0778)  0.4563 (0.1267)  0.6693 (0.1628)
250  30  3  Mult, bal          0.4045 (0.0347)  0.4530 (0.0508)  0.4586 (0.0624)  0.4499 (0.0382)
250  30  3  Mult, unbal, all   0.3899 (0.0509)  0.4585 (0.0550)  0.4518 (0.0822)  0.4580 (0.0589)
250  30  3  Mult, unbal, miss  0.2700 (0.0291)  0.3638 (0.0737)  0.2803 (0.0620)  0.2840 (0.0457)
500  68  5  Both, bal          0.4096 (0.0504)  0.5711 (0.0543)  0.5951 (0.1284)  0.6647 (0.0928)
500  68  5  Both, unbal, miss  0.4327 (0.0405)  0.5435 (0.0350)  0.5560 (0.2027)  0.6291 (0.1050)
300  40  7  Single             0.7399 (0.0545)  0.6437 (0.0402)  0.0906 (0.0168)  0.7109 (0.0409)

First, we examine performance differences across Q-matrix designs. The first Q-matrix 
has only three skills; each skill occurs in 10 single skill items. The ARI for all three methods 
of estimation and all clustering methods is at or near 1 in nearly all cases. Across the methods, 
K-means has the lowest ARI. This is not surprising as we randomly select $2^K = 8$ observations 
as the starting centers. A more informed set of starting centers (i.e., the natural skill 
set profiles) may lead to better performance. For the K = 3 examples, the ARI is higher 
when there are only single skill items than when there are both single and multiple 
skill items or only multiple skill items. The lone exception is MBC with sum-scores 
(Single, bal = 0.9191, Both, bal = 0.9321). The standard deviation in this case (0.2899) is 
rather large and indicates a wide range of ARI values for these 20 simulated datasets. 

We now take a closer look at Q-matrices with at least some multiple skill items. We can 
note that the performance of all three clustering methods is better (as indicated by a higher 
ARI) when there are both single and multiple skill items in the Q-matrix, compared to only 
multiple skill items (also true across all three methods of estimation). In addition, when 
the Q-matrix has a balanced design, as opposed to an unbalanced design, the recovery of 
the true skill set profiles is better. In general, the performance of the three estimates of the 
student skill knowledge is similar across the clustering methods. This similar performance 
is particularly interesting since using sum-scores and the capability matrix yields large 
computational savings when compared to estimating the DINA model using WinBUGS (up to 
700 times faster; [1]). Moreover, in this simulation study the data are generated from the 
DINA model; we would expect the Bayesian estimation to perform well in this best-case 
scenario. For sum-scores and the capability matrix to perform as well as, and better than in 
some cases, the DINA model is noteworthy. 

Table 4: Clustering the DINA Model Estimates of Student Skill Knowledge for N = 
250, J = 30, K = 3 with Missing Response Data 

Q-matrix design    % missing  DINA     HAC      K-means  MBC      MBC 2^K
Both, bal          0          0.9793   0.9781   0.8367   0.8915   0.9632
Both, bal          10         0.4584   0.4690   0.4750   0.4725   0.4754
Both, bal          20         0.4326   0.4550   0.4581   0.4544   0.4567
Both, bal          30         0.4006   0.4340   0.4276   0.4267   0.4306
Both, bal          40         0.3513   0.3825   0.3850   0.3655   0.3681
Both, unbal, miss  0          0.9240   0.9131   0.7696   0.8811   0.9132
Both, unbal, miss  10         0.9084   0.9057   0.7516   0.8274   0.8009
Both, unbal, miss  20         0.8775   0.8651   0.7294   0.7560   0.7578
Both, unbal, miss  30         0.8193   0.8160   0.7256   0.7052   0.6948
Both, unbal, miss  40         0.7694   0.7746   0.7181   0.6515   0.6114

Table 5: Clustering the Sum-Score Estimates of Student Skill Knowledge for N = 250, J = 
30, K = 3 with Missing Response Data 

Q-matrix design    % missing  HAC      K-means  MBC      MBC 2^K
Both, bal          0          0.7644   0.8156   0.9321   0.9442
Both, bal          10         0.6255   0.7671   0.8280   0.8489
Both, bal          20         0.5000   0.6717   0.4854   0.7526
Both, bal          30         0.4191   0.5855   0.4131   0.5309
Both, bal          40         0.3168   0.5072   0.2951   0.3867
Both, unbal, miss  0          0.6482   0.6728   0.7066   0.7661
Both, unbal, miss  10         0.5744   0.6091   0.3608   0.6563
Both, unbal, miss  20         0.4834   0.5556   0.3264   0.5414
Both, unbal, miss  30         0.3686   0.4876   0.2725   0.3961
Both, unbal, miss  40         0.3266   0.4203   0.2514   0.2624

The above results are for student populations with complete response data. In practice, 
missing responses (unanswered questions) will be ubiquitous. We chose two Q-matrix 
designs with N = 250, J = 30, and K = 3 (Both, bal and Both, unbal, miss) and removed 
0, 10, 20, 30, and 40% of the student responses completely at random for each of the 
20 student populations. Results can be seen in Tables 4, 5, and 6. Note that the 0% 
missing corresponds to the previously shown results. Again, we report the median ARI. 
The standard deviations are not shown due to space limitations. They ranged from 0.03 to 
0.16 and were generally ordered as DINA model (lowest), capability matrix, and sum-score 
(highest). 
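
Removing responses completely at random can be sketched as below. The paper does not state whether an exact share of cells was removed or whether each cell was dropped independently; this illustration assumes the former, and the function name and `None` encoding are hypothetical.

```python
import random


def remove_at_random(Y, fraction, rng):
    """Return a copy of Y with a share `fraction` of all entries set to None."""
    cells = [(i, j) for i, row in enumerate(Y) for j in range(len(row))]
    masked = [list(row) for row in Y]  # leave the original matrix untouched
    for i, j in rng.sample(cells, round(fraction * len(cells))):
        masked[i][j] = None
    return masked


Y = [[1] * 10 for _ in range(10)]          # 10 students, 10 items
Y_missing = remove_at_random(Y, 0.30, random.Random(0))
n_missing = sum(y is None for row in Y_missing for y in row)
```

Because the cells are chosen without regard to student or item, some students may lose every item for a given skill, which is the situation where the capability matrix needs an imputed value.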

Table 6: Clustering the Capability Matrix Estimates of Student Skill Knowledge for N = 
250, J = 30, K = 3 with Missing Response Data 

Q-matrix design    % missing  HAC      K-means  MBC      MBC 2^K
Both, bal          0          0.7644   0.7947   0.9353   0.9411
Both, bal          10         0.6682   0.7894   0.6633   0.8786
Both, bal          20         0.6028   0.7491   0.5350   0.7655
Both, bal          30         0.6022   0.7141   0.5021   0.5505
Both, bal          40         0.4842   0.6103   0.3948   0.4086
Both, unbal, miss  0          0.6698   0.7390   0.4563   0.6693
Both, unbal, miss  10         0.6032   0.6980   0.4766   0.5473
Both, unbal, miss  20         0.5761   0.6629   0.4687   0.4654
Both, unbal, miss  30         0.5351   0.6251   0.4764   0.4775
Both, unbal, miss  40         0.5108   0.5658   0.4144   0.4335

In general, as the amount of missing data increases, the ARI decreases across all three 
estimation methods and all methods of clustering. However, some methods show more 
substantial decreases than others. When using the capability matrix, K-means shows rel- 
atively stable performance for both Q-matrix designs. For the Both, unbal, miss design, 
HAC and MBC also show stable performance. When using sum-scores, the performance 
drops more noticeably across all clustering methods, which may reflect that the capability 
matrix scales for the number of questions answered while sum-scores do not. In the Both, 
bal case, the performance of the capability matrix estimates is generally better than both 
the DINA model estimates and the sum-scores (particularly true for K-means). For HAC, 
sum-scores and the capability matrix perform similarly (both better than the DINA model 
estimates). For the Both, unbal, miss case, the performance of the DINA model estimates is 
better than both sum-scores and the capability matrix estimates. When using the capability 
matrix estimates, K-means clustering performs best; its ARI values are only slightly lower 
than those of the DINA model. 

5 Conclusions 

Simulated examples show that recovery of the true skill set profiles is best when only 
single skill items occur. For Q-matrices with multiple skill items, recovery is improved if 
there are also single skill items present. These results hold across all three clustering meth- 
ods and all three estimates of student skill knowledge. In addition, we note that the more 
computationally attractive capability matrix and the sum-score estimates perform similarly 
to the Bayesian estimation of the DINA model. 

However, when there are missing responses, the performance of the estimation procedures 
changes. In general, the ARI values decrease as the percent of missingness increases 
(across all estimation and clustering methods). When the Q-matrix has a Both, bal design, 
the capability matrix estimates perform better than both the DINA model and sum-score 
estimates. In the Both, unbal, miss design, the DINA model estimates perform better than 
sum-scores and the capability matrix estimates. 

These results can be used to guide the design of exams and tutor problems. For better 
estimation of student skill knowledge, single skill items should be included for each skill. 
In addition, students should be encouraged to finish all items. Whether or not it is by 
design, when students use online tutors, for example, they often do not complete all the 
items. In this case, it is particularly important for single skill items to be included. In the 
presence of missing responses, however, care should be taken when choosing an estimation 
method and a clustering method. The best choice is not obvious. 

While there are benefits of using the capability matrix and/or sum-scores, we note that 
if an item requires multiple skills and a student answers incorrectly, all skills required by 
the item will receive a penalty, even if the student has mastered one (or more) of the skills. 
In future work, we will explore the behavior of alternative estimates that better account 
for multiple skill items. Possible methods could use empirical performance on single skill 
items or weight by the number of skills required by the incorrectly answered item. 